# load the tidyverse library (which includes dplyr and ggplot2)
library(tidyverse) # or library(ggplot2) and library(dplyr)
# load the gapminder dataset for this lesson
gapminder <- read.csv("data/gapminder_data.csv")
Data visualization with ggplot2
ggplot2 is built on the grammar of graphics, which builds plots in layers.
Let’s start off with an example:
# use ggplot to initialize a plot of gapminder's gdpPercap (x) and lifeExp (y)
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
  # add a scatterplot (points) layer
  geom_point()
The two top-level functions we have used are ggplot() and geom_point().
Notice the use of + to add a layer.
ggplot():
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
This function lets R know that we’re creating a new plot, and any of the arguments we give the ggplot function apply to all layers of our plot.
We’ve passed in two arguments to ggplot:
data = gapminder: tells ggplot what data we want to show on our figure.
mapping = aes(x = gdpPercap, y = lifeExp): tells ggplot how variables in the data should map to aesthetic properties (e.g., the x and y coordinates).
geom_point()
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point()
geom_point() adds a scatterplot using the data and global aesthetics we specified in ggplot().
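Because a plot is an ordinary R object, the way a layer inherits from ggplot() can be checked directly. The sketch below is only an illustration: toy is a hypothetical three-row data frame with made-up values that borrows the gapminder column names, and it relies on ggplot2's layer_data() helper to show what the points layer will draw.

```r
library(ggplot2)

# Hypothetical toy data (made-up values), reusing the gapminder column names
toy <- data.frame(
  gdpPercap = c(500, 5000, 30000),
  lifeExp   = c(45, 65, 80)
)

# ggplot() alone sets up the data and the global aesthetics, with no layers yet
base <- ggplot(data = toy, mapping = aes(x = gdpPercap, y = lifeExp))

# Adding a layer with `+` returns a new plot; `base` itself is unchanged
p <- base + geom_point()

# layer_data() returns the data the first layer will draw:
# its x and y columns come from the global aes() given to ggplot()
pts <- layer_data(p)
pts[, c("x", "y")]
```

The x and y columns of pts match gdpPercap and lifeExp, even though geom_point() was called with no arguments of its own.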
Captions via code chunk options
You can add a caption to a figure in a Quarto document by supplying the label and fig-cap chunk options:
gapminder |>
  filter(year == 2007) |>
  ggplot(mapping = aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point()
Let’s compile our document to check that a caption appeared for this figure!
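As a sketch of what such a chunk might contain: the label fig-gdp-lifeexp-2007 and the caption text below are hypothetical placeholders, and toy is a made-up stand-in for the gapminder data so that the example runs on its own. In a .qmd file the `#|` lines go at the very top of an {r} code chunk; Quarto treats a label starting with fig- as a cross-referenceable figure.

```r
#| label: fig-gdp-lifeexp-2007
#| fig-cap: "Life expectancy versus GDP per capita in 2007 (hypothetical caption)."

library(dplyr)
library(ggplot2)

# Hypothetical stand-in for the gapminder data (made-up values)
toy <- data.frame(
  continent = c("Africa", "Africa", "Europe", "Europe"),
  year      = c(2002, 2007, 2002, 2007),
  gdpPercap = c(900, 1000, 28000, 30000),
  lifeExp   = c(52, 54, 78, 79)
)

# Keep only the 2007 rows, then plot them
p_2007 <- toy |>
  filter(year == 2007) |>
  ggplot(mapping = aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point()

# The chunk must print the plot for the figure (and its caption) to appear
p_2007
```

When this is run as a plain R script, the `#|` lines are ordinary comments and have no effect; only Quarto interprets them.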
Line plots
Let’s try to visualize life expectancy for each country over time, coloring our lines by continent:
# use ggplot() to plot lifeExp versus year as a line plot
# and try to color the lines by continent
ggplot(gapminder, aes(x = year, y = lifeExp, color = continent)) +
  geom_line()
Our plot looks strange… what’s going on in this plot?
We haven’t told ggplot that we want a separate line for each country.
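One way to confirm this is to count how many groups ggplot2 actually draws. The sketch below uses a hypothetical toy data frame (made-up values, gapminder column names) with three countries on two continents, and ggplot2's layer_data() helper to inspect the line layer.

```r
library(ggplot2)

# Hypothetical toy data: three countries, two continents, two years each
toy <- data.frame(
  country   = rep(c("A", "B", "C"), each = 2),
  continent = rep(c("Europe", "Europe", "Asia"), each = 2),
  year      = rep(c(2002, 2007), times = 3),
  lifeExp   = c(70, 72, 75, 77, 60, 64)
)

p <- ggplot(toy, aes(x = year, y = lifeExp, color = continent)) +
  geom_line()

# Each distinct value of `group` in the layer data is drawn as one line
n_lines <- length(unique(layer_data(p)$group))
n_lines                        # number of lines drawn
length(unique(toy$country))    # number of countries in the data
```

With only color = continent mapped, ggplot2 forms one group per continent, so the plot draws fewer lines than there are countries, and countries on the same continent are joined into a single zig-zagging line.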
We can ask for one line per country by adding a group argument inside the aes() function:
# use ggplot() to plot lifeExp versus year as a line plot and group by country
# and try to color the lines by continent
ggplot(gapminder, aes(x = year, y = lifeExp, color = continent, group = country)) +
  geom_line()
Multiple geom layers
We can visualize both lines and points on the same plot by adding multiple geom_*() layers:
# Add a points layer to the line plot above
ggplot(gapminder, aes(x = year, y = lifeExp, group = country, color = continent)) +
  geom_line() +
  geom_point()
Supplying local layer aesthetics
In the example above, the aesthetics (from aes()) are applied to both layers.
To apply an aesthetic just to one layer, you can supply a separate aes() function to the layer:
# recreate the plot above but apply the color aesthetic just to the lines layer
ggplot(gapminder, aes(x = year, y = lifeExp, group = country)) +
  geom_line(aes(color = continent)) +
  geom_point()
The order of the layers
Each layer is drawn on top of the previous layer. What happens if we switch the order of the layers?
# Rewrite the code above but with the points layer and line layer in the opposite order
ggplot(gapminder, aes(x = year, y = lifeExp, group = country)) +
  geom_point() +
  geom_line(aes(color = continent))
Setting aesthetics to a uniform (non-data) value
To change the aesthetic of all lines/points to a value that is not dictated by the data, you may think that aes(color="blue") should work, but it doesn’t.
Let’s try to set the color of our lines to “blue”:
# create the same line plot as above of year vs lifeExp for each country,
# but try to set the color of all of the lines to "blue" inside aes():
ggplot(gapminder) +
  geom_line(aes(x = year, y = lifeExp, group = country, color = "blue"))
When setting an aesthetic to a value that does not correspond to a variable from our data, we need to move the color specification outside of the aes() function:
# fix the above code by moving the `color` argument outside `aes()`
ggplot(gapminder) +
  geom_line(aes(x = year, y = lifeExp, group = country), color = "blue")
Transparency
Another aesthetic value that is helpful is adding transparency using alpha:
# Add transparency (alpha = 0.2) to the previous line plot of year vs life exp
ggplot(gapminder) +
  geom_line(aes(x = year, y = lifeExp, group = country),
            color = "blue", alpha = 0.2)
Transformations
Recall our scatterplot of gdpPercap vs lifeExp (this time with transparency):
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.5)
Let’s add a scale layer to present the x-axis on a log10 scale:
# add a log-10 scale for the x-axis from the previous plot
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.5) +
  scale_x_log10()
Adding a linear fit
We can also fit a simple relationship to the data by adding another layer, geom_smooth():
# add a lm smooth layer to the previous plot
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.5) +
  scale_x_log10() +
  geom_smooth(method = "lm")
`geom_smooth()` using formula = 'y ~ x'
Try changing the linewidth using the linewidth argument.
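As a starting point for that exercise, here is a minimal sketch, assuming ggplot2 3.4.0 or later (the release in which linewidth was introduced for lines). The toy data frame holds made-up values with the gapminder column names, and the value 2 is an arbitrary choice.

```r
library(ggplot2)

# Hypothetical toy data (made-up values) with the gapminder column names
toy <- data.frame(
  gdpPercap = c(400, 900, 2500, 7000, 15000, 40000),
  lifeExp   = c(42, 50, 58, 68, 74, 80)
)

p <- ggplot(toy, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.5) +
  scale_x_log10() +
  # linewidth is set outside aes(), so it applies to the whole fitted line;
  # spelling out formula = y ~ x silences the informational message
  geom_smooth(method = "lm", formula = y ~ x, linewidth = 2)

p
```

Because linewidth is a fixed value rather than a variable in the data, it follows the same rule as color = "blue" earlier: it goes outside aes().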
Multi-panel figures
Earlier we visualized the change in life expectancy over time across all countries in one plot like this:
ggplot(gapminder, aes(x = year, y = lifeExp, color = continent, group = country)) +
  geom_line()
Another way to view this data is to create a separate plot for each continent.
One way to do this would be to create a separate plot for each continent manually. For example:
# create a line plot for the countries in the Americas only
gapminder |>
  filter(continent == "Americas") |>
  ggplot(aes(x = year, y = lifeExp, group = country)) +
  geom_line()
# create a line plot for the countries in Europe only
gapminder |>
  filter(continent == "Europe") |>
  ggplot(aes(x = year, y = lifeExp, group = country)) +
  geom_line()
But there is a more efficient way to do this using facet_wrap():
# create a grid of line plots of year vs lifeExp for the countries in each continent
# using facet_wrap()
ggplot(gapminder, aes(x = year, y = lifeExp, group = country)) +
  geom_line() +
  facet_wrap(~continent)
Modifying labels
You can add labels to plots using the labs() function.
gapminder |>
  filter(country == "Brazil") |>
  ggplot() +
  geom_line(aes(x = year, y = lifeExp)) +
  # Add reasonable labels to the plot above
  labs(x = "Year",
       y = "Life expectancy",
       title = "Life expectancy by year in Brazil")
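labs() also accepts the name of any mapped aesthetic, which sets the title of its legend. The sketch below is illustrative only: toy is a hypothetical data frame with made-up values, and the label texts are arbitrary.

```r
library(ggplot2)

# Hypothetical toy data (made-up values) for two countries
toy <- data.frame(
  country = rep(c("A", "B"), each = 3),
  year    = rep(c(1997, 2002, 2007), times = 2),
  lifeExp = c(68, 70, 72, 74, 75, 77)
)

p <- ggplot(toy, aes(x = year, y = lifeExp, color = country)) +
  geom_line() +
  # labs() has to be joined to the plot with `+`, like any other layer
  labs(
    x = "Year",
    y = "Life expectancy (years)",
    color = "Country",   # legend title for the color aesthetic
    title = "Life expectancy by year"
  )

p
```

If the `+` before labs() is left out, R evaluates labs() as a separate expression and prints the list of labels instead of applying them to the plot.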
Built-in themes
There are several themes for making your plots even prettier. For example,
theme_classic():
# add theme_classic() to the following plot
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(aes(size = pop, color = continent)) +
  scale_x_log10() +
  theme_classic()
theme_minimal():
# add theme_minimal() to the following plot
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(aes(size = pop, color = continent)) +
  scale_x_log10() +
  theme_minimal()
theme_bw():
# add theme_bw() to the following plot
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(aes(size = pop, color = continent)) +
  scale_x_log10() +
  theme_bw()
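Since the three examples differ only in their last line, the shared layers can be stored once and each theme added to that object. This is a sketch with a hypothetical toy data frame (made-up values, gapminder column names); the names base_plot and themed are illustrative.

```r
library(ggplot2)

# Hypothetical toy data (made-up values) with the gapminder column names
toy <- data.frame(
  continent = c("Africa", "Asia", "Europe"),
  gdpPercap = c(1000, 8000, 30000),
  lifeExp   = c(55, 70, 80),
  pop       = c(2e7, 1e9, 6e7)
)

# Build the shared layers once
base_plot <- ggplot(toy, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(aes(size = pop, color = continent)) +
  scale_x_log10()

# Add a different built-in theme to each copy; base_plot is left unchanged
themed <- list(
  classic = base_plot + theme_classic(),
  minimal = base_plot + theme_minimal(),
  bw      = base_plot + theme_bw()
)

# Print one of them to compare
themed$bw
```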